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Abstract. Computational studies of the interpretation of structure from motion 
examine the conditions under which three-dimensional structure can be recovered 
from motion in the image. The first part of this paper summarizes the main 
results obtained to date in these studies. The second part examines two issues: 
the robustness of the 3-D interpretation of perspective velocity fields, and the 3-D 
information contained in orthographic velocity fields. The two are related because, 
under local analysis, limitations on the interpretation of orthographic velocity fields 
also apply to perspective projection. The following results are established: 

• When the interpretation is applied locally, the 3-D interpretation of the 
perspective velocity field is unstable. 

• The orthographic velocity field determines the structure of the inducing object 
exactly up to a depth- scaling. K 

• For planar objects, the orthographic velocity field always admits two distinct 
solutions up to depth-scaling. 

• The. 3-D structure is determined uniquely by a "view and a half" of the 
orthographic velocity field. 
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1 Introduction 

When objects move in the environment, the images they cast upon our retinas 
undergo complex transformations. The human visual system can interpret these 
transformations to recover the three-dimensional (3-D) structure of the viewed 
objects and their motion in space. 

This capacity to interpret structure from motion has been demonstrated in a number 
of studies 1,2 . Its earliest systematic investigation was carried out by Wallach and 
OConnell 3 in the study of what they have termed "the kinetic depth effect". In 
their experiments, an unfamiliar object was rotated behind a translucent screen, 
and the shadow cast on the screen by a distant light source was observed from the 
other side of the serene. In most cases, the viewers were able to describe correctly 
the hidden object and its motion, even when each static shadow projection of the 
object was unrecognizable, and contained no three-dimensional information. 

The original kinetic depth experiments employed primarily wireframe objects which 
projected as sets of connected lines. Later studies 4,5 ' 2 established that 3-D structure 
can be perceived from displays consisting of unconnected elements in motion, and 
under both continuous and apparent motion conditions. 

Additional demonstrations of motion-based interpretation were provided by the 
remarkable experiments of Johansson 6 ' 7 . These demonstrations were created by 
filming human actors moving in the dark with small light sources attached to their 
main joints. Each actor was thus represented by up to 13 moving light dots. The 
resulting dynamic dot patterns created a vivid, three-dimensional impression of the 
actors and their motion. 

In this memo, I shall review the main results obtained to date in the computational 
study of the interpretation of structure from motion. These studies examine the 
problem from a theoretical standpoint in an attempt to attain two main goals. The 
first is what may be called the underlying computational theory of the task 8,9 . This 
theory tries to explain how the interpretation can be achieved in principle by any 
biological visual system or by a man-made device. The second computational goal 
is to develop and compare different schemes that can actually recover 3-D structure 
from motion. The study of these schemes, their relative merits and shortcomings, 
and the comparison of their performance with that of humans, can lead to a better 
understanding of the interpretation scheme embodied in the human visual system. 

In the next section, I shall analyze the computational problem, and describe a scheme 
for recovering structure from motion. Only a brief outline will be given, since a 
detailed description can be found elsewhere. 10 Section 3 will review recent alternative 
schemes and additional computational results obtained to date. Finally, Section 
4 will make some comparisons among the main different schemes. In particular, 
the recovery of structure from continuous velocity fields under orthographic and 
perspective projections will be examined. 




Figure 1 The interpretation of structure from motion. The dots comprising the two cylinders 
are projected on the screen (the outline of the cylinders in not shown in the actual presentation). 
The 3-D structure of the cylinders can be recovered from the motion of the dots across the screen 
(see Ullman, 1979). 
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2 The rigidity- based interpretation of structure from motion 

In this section, we shall assume that the motion is given as a sequence of discrete 
frames, each one depicting a collection of unconnected elements. Figure 1 shows 
an example in which the elements are lying on the surface of two coaxial, invisible 
cylinders. The 3-D coordinates of all the dots are stored in a computer's memory, 
and their projection on the frontal plane is computed and presented on a CRT 
screen. The imaginary cylinders are then rotated (up to about 10 degrees between 
frames), and their new projection is computed and displayed on the screen. Each 
single static view of the cylinders appears as a random collection of dots. However, 
when the changing projection is viewed in a movie-like fashion, the elements in 
motion across the screen are perceived as two cylinders whose shapes and angles of 
rotation are easily determined. 

How can this interpretation be achieved? The fundamental underlying problem is 
the ambiguity of the interpretation: there are many different motion patterns in 
space that could produce the same two-dimensional motion of the elements on the 
screen. To resolve this ambiguity, the interpretation scheme must incorporate some 
additional constraints that would rule out most of the possible interpretations and 
force a unique solution, which in most cases is also the correct one. 

A possible constraint, suggested originally by Wallach and O'Connell 3 is a rigidity 
constraint. That is, the preferred interpretation is the one in which the elements 
move together as a rigid object rather than a collection of elements moving 



independently in space. The suggested rigidity constraint raises, however, a number 
^*N, of problems. The first is the question of uniqueness: if the same 2D transformation 

of the elements can be produced by different 3-D objects, participating in different 
3-D movements, then rigidity must be rendered an insufficient constraint for the 
3-D interpretation task. A second problem is the possibility of false targets: if 
the elements participate in fact in a non-rigid motion in space, and if their 2D 
projoction happens to have a rigid interpretation, then this false rigid solution will 
be forced upon the elements. Finally, there is the multiple object problem: if the 
observed elements belong not to a single rigid object, but to distinct objects moving 
independently, then the collection as a whole will fail to have a rigid interpretation. 
A rigidity-based interpretation must, therefore, be applied somehow to relevant 
subcollections of the elements, and not to the entire scene at once. 

A detailed analysis of these problems can be found elsewhere 2 ' 10 , together with a 
discussion of possible implications to human motion perception. Here, I shall only 
sketch briefly the main computational results. It has been shown by Ullman and 
Fremlin (in the "structure from motion" theorem 2 ), that, given three distinct views 
of a moving object, it can be determined unambiguously whether they represent 
a single rigid object, and if they do, then the 3-D structure can be recovered 
uniquely. The object is defined here as a collection of identifiable points, and to 
obtain uniqueness, the object must contain at least four non-coplanar points. 
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Rigid objects in motion are thus determined uniquely on the basis of information 
that is local in both space and time. This result provides answers to the problems 
raised above. Uniqueness of the solution is guaranteed, provided that the moving 
object contains at least four non-coplanar points. The possibility of false targets 
is eliminated: it can be shown that the probability of four points that, in fact, 
do not belong to a single rigid object will happen to have a rigid interpretation 
is negligible. Finally, the locality of the interpretation suggests a solution to the 
multiple object problem. If the interpretation is restricted to local groups of about 
four elements each, then, due to the contiguity of objects, many of these groups will 
lie within a single object, and therefore their interpretation will not be affected by 
the additional objects. Given a scene containing several rigid objects in motion, a 
correct interpretation can therefore be obtained using the following scheme: divide 
the image into local groups of about four elements each, test each group for a unique 
rigid interpretation, and combine the results obtained for the different groups. 



3 Alternative schemes and additional results 

The analysis outlined in the previous section has been formulated in terms of 
distinct views, distinct identifiable points, and a parallel projection of the moving 
objects. In this section, I shall examine similar results obtained under somewhat 
different formulations. 
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3.1 Perspective vs. parallel projection 

The structure-from-motion theorem mentioned above assumed parallel or or- 
thographic projection. Unlike perspective projection, orthographic projection is 
formed by parallel light rays that are perpendicular to the image plane. Under such 
projection, the interpretation is unique up to a possible reflection about the image 
plane. This is an inherent ambiguity, since the projections of a rotating object, and 
its mirror image rotating about the same axis in the opposite direction, coincide 
under parallel projection. It is not surprising, therefore, that under orthographic 
projection, and in the lack of any additional source of 3-D information, human 
observers experience spontaneous depth- reversals of the objects, accompanied by 
reversal in the observed direction of rotation. 

The natural projection of 3-D objects onto the retina is a perspective rather 
than parallel projection (i.e., the projection rays are not parallel, but meet 
at a common point). Although it has been demonstrated (e.g., in the original 
kinetic depth experiments) that structure can be perceptually recovered from the 
parallel projection of moving objects, it is of interest to analyze the rigidity-based 
interpretation under perspective projection. Experiments with computer algorithms 
(some of which are described elsewhere 2 ) have suggested that seven or eight points 
in two perspective views are sufficient for a unique interpretation. Six points were 
sometimes sufficient for a unique interpretation, but not always. Longuet-Higgins 11 
has proposed an elegant algorithm for recovering 3-D structure from two perspective 
views of eight points. The computation required is particularly simple, involving 
primarily the solution of eight linear equations. This analysis did not provide, 
however, a uniqueness proof. 

Recently, Tsai and Huang 12 provided a comprehensive analysis of the uniqueness 
problem under two perspective views. Their results established that seven points 
guarantee a unique interpretation, provided that (i) they do not lie on a pair of 
planes, one of which passes through the origin, and (ii) they do not lie on a single 
cone containing the origin. This means that, except for a few cases where the points 
happen to form some special configurations in space, the interpretation will be 
unique. Tsai and Huang provided, in addition, simple algorithms for the recovery 
of the motion and structure parameters. 

8.2 On the meaning of "Computational Experiments" 

Those who associate "experiments" primarily with the testing of human subjects 
may wonder what is meant here by experiments with computer algorithms. The 
answer lies in the fact that it is often easier to find a solution to a problem (at 
least an approximate one) than to prove its existence and uniqueness. Suppose, for 
example, that it is conjectured that the three-dimensional structure of an object 
can be recovered from its changing projection by solving a certain system of linear 
equations. The coefficients in these equations will be variables that assume different 
values for different objects in motion. It may be difficult to show that the system 
has, in general, one solution. It is a straightforward procedure, however, to solve 
the equations for particular examples and verify the solution and its uniqueness. 



If we test the solution for a variety of examples, and consistently recover the 
f*\ correct solution, we have some reason to believe that the interpretation scheme is, 

in general, correct. It is still possible, however, that under certain conditions, the 
interpretation scheme will fail. Such conditions may be difficult to discover, and 
this is one reason why a comprehensive analytic analysis is more satisfaactory than 
computational experiments. 



3.3 The use of velocity information 

So far, the changing projection of the moving objects has been described in terms of 
discrete movie-like frames. It should be noted, however, that the formulation of the 
structure-from-motion theorem in terms of discrete views does not imply that the 
input image must be discrete rather than continuous. If a continuous motion extends 
long enough to contain three distinct views (and the qualification for "distinct" 
will depend on the accuracy of the imaging system), then it contains sufficient 
information for a unique interpretation. The theorem states this fact without 
excluding the possibility of implementing the computation using a continuous 
scheme. 

A distinctive property of the scheme outlined in Section 2 is that the information 
used was expressed entirely in terms of the positions of the elements at different 
times. An alternative formulation 13 ' 14 uses the velocities of the points as well as 
their positions. 

The input to the computation consists then of a single perspective view in which the 
positions of the moving elements and their velocities are specified. This velocity- 
based formulation can be viewed as the limiting case of two frames, as the time 
interval between them approaches zero. The uniqueness problem then takes the 
following form: given the position and velocity of N points in the image, determine 
whether or not they belong to a single moving object, and find the 3-D structure 
of the object and its motion in space. 

A preliminary theoretical problem is to determine the number N for which this 
recovery problem has a unique solution. Mathematically, this problem is still 
unresolved. A counting argument of equations and unknowns shows that at least 
five points would be necessary. A computer algorithm implemented by Prazdny 14 
suggests that five points might also be sufficient. Since the computer algorithm 
proved sensitive to errors in the input, especially when the viewed object was small, 
it seems that a robust recovery algorithm would require more than five points. This 
problem of robustness is examined again in more detail in Section 4.3. 
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8.4 The use of a continuous velocity field 

The scheme outline earlier relied on a small number of discrete elements. An 
alternative mathematical approach is to assume that the velocity of points in the 
image is known everywhere within a given region. The information is sometimes 
assumed to be known only locally; for example, the velocity field and its spatial 



derivatives are known at a single point. This formulation can be thought of as a 
limiting case of the discrete formulation, as the distance between points approaches 
zero. 

The most complete analysis to date of this problem has been provided by Longuet- 
Higgins and Prazdny 15 . Their analysis has established that the velocity field at a 
point has at most three different interpretations. More precisely, it showed that 
for nonplanar surfaces, given the velocity field and its first and second spatial 
derivatives at a point, there are at most three solutions to the surface orientation 
at that point. The analysis also provided a scheme for computing the solutions. 
The possible improvement of this result— in particular, a determination of whether 
the solution is in fact unique — poses an open question for future research. 

In an analysis of the orthographic continuous field, Hoffman 16 has shown that 3-D 
structure can be recovered uniquely (up to the unavoidable reflection about the 
image plane) if both the velocity and the acceleration fields are known within a 
region. 



8.5 Restricted motion and the interpretation of biological motion 

The discussion so far has examined the general case of unrestricted motion. 
Additional results have been obtained for situations where certain limitations are 
imposed upon the motion of the viewed objects. In this section, I shall review the 
f"**^ main results, together with their implication to problems in visual perception.. 

Results of considerable interest were obtained recently 17 , which have shown that 
for a rigid rod moving in a plane, its length and orientation in space are determined 
uniquely by three views (under parallel projection). A similar result was obtained for 
two rigid rods hinged together end-to-tail. When such a pairwise-rigid configuration 
moves in a plane, its 3-D structure and motion are uniquely determined by only 
two parallel projections. 

These results offer a powerful tool for the interpretation of biological motion 
(the kind of interpretation demonstrated by Johansson's experiments). Using 3-D 
measurements of motion in space, Hoffman has established that the arm and leg 
motion during locomotion often conforms to the planar motion constraint. Hoffman 
and Flinchbaugh 17 found that when the planarity-based scheme is applied to data 
obtained from Johannson-like experiments, a correct interpretation of the 3-D 
structure and motion of the moving light dots is often obtained. Their results 
suggest that the human visual system may incorporate processes that are capable 
of detecting pairs or triplets ,of feature points engaged in planar motion, and 
apply to them the planarity-based interpretation scheme. Figure 2 illustrates the 
application of the planarity-based interpretation scheme to six dots representing 
the human body in motion. The unconnected dots are shown in 2a and the rigid 
connection in 26. These connections and their 3-D structure can be established by 
the planarity-based scheme for all the rigid links that obey the planarity constraint. 

A similar scheme that can cope with somewhat less restricted motion was developed 
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(a) 



(b) 



elbow 



wrist 



knee 




shoulder 



ankle 



Figure 2 The interpretation of biological motion, (a) Six unconnected dots representing parts 
of the human body in motion. The motion of these dots can give rise to a vivid perception of a 
moving person, (b) Links made along the rigid connections. These links and their 3-D structure 
can be recovered by the planarity-based interpretation scheme (from Hoffman & Flinchbaugh, 
1982). 
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by Webb and Aggarwal. 18 They have considered the interpretation problem for 
an object assumed to rotate continuously about a fixed axis. (In general, the axis 
of rotation may change with time.) They have applied this scheme successfully 
to instances of biological motion, but no mathematical results regarding the 
information required for a unique interpretation have yet been obtained. 



S.6 Vertical rotation and horizontal translation: 
The computation of depth from stereoscopic disparity 

A recent result by Longuet-Higgins 19 established uniqueness for the case of an object 
that rotates about the vertical axis and translate in the horizontal plane. When 
the motion is restricted in this manner, three points are, in general, sufficient for a 
unique interpretation. The significance of this result lies in its possible applicability 
to the computation of depth from stereoscopic disparity. In the stereoscopic case, the 
object is in fact, fixed, and the different views are obtained by the different viewing 
positions of the two eyes. This problem is formally equivalent to the interpretation 
of structure from motion, given only two views. The computation of depth from 
stereoscopic disparities requires, in principle, knowledge of the direction of gaze of 
the two eyes. Longuet-Higgins' result suggests that this information can be obtained 
without reliance on non- visual information. Assuming that the horizontal meridians 
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of the two eyes coincide accurately, three points (non-meridional, and not all lying 
in the vertical plane) are sufficient for the recovery of the two directions of gaze. 

The main results regarding the unique interpretation of structure-from-motion 
discussed so far are summarized in Table 1 for general (upper half) and restricted 
(lower half) motion. 



TABLE 1 
MAIN RESULTS OF STUDIES ON THE RECOVERY OF STRUCTURE FROM MOTION 



Unrestricted Motion 



Discrete Points s Views Discrete Points, velocities Velocity Field 8 its Derlvotlves 

1 points in 3 orthographic views 5 points and their velocities Up to 3 solutions for general 
(Ullman 8 Fremlin, 1979) in a single perspective view motion (Longuet-Hlgglns 8 

(Prazdny, 1980) Prazdny, 1980) 

7 points in 2 perspective views Uniaue solution from velocity 

(Tsal s Huang, 1981) and acceleration under ortho- 

graphic projection (Hoffman, 
1980) 

Application: The recovery of 3-D structure from unrestricted motion, 



Restricted Motion 



3 orthographic views of two points in planar motion 

2 orthographic views of 3 points in a "hinged" configuration, planar motion (Hoffman 8 Flinchbaugh, 

1982) 
Application: Biological motion 

3 non-meridional points, vertical axis and horizontal translation (Longuet-Hlgglns, 1983) 
Application: The recovery of depth from stereo disparities 



/*">, 



Table 1: Uniqueness of the interpretation of structure from motion. The main 
results obtained to date are summarized for general motion (upper half) and 
restricted motion (lower half). 



4 Remarks concerning the different schemes and 
their relevance to perception 

The previous section examined the interpretation problem in a number of different 
formulations. Since a primary goal of these studies is to provide a computational 
basis for the study of perceptual phenomena, it is worthwhile to compare the 
relevance of the different schemes to human perception. Three questions will be 
examined briefly in this section. The first has to do with the applicability of 
mathematical results and algorithms to biological visual systems; the second has 



to do with the use of positions versus velocity fields; the third with parallel and 
f\ perspective projections. On the first two of these questions, I shall limit myself to 

a brief discussion of a few selected points. 



4-1 Mathematical algorithms and biological visual systems 

The results outlined in the preceding two sections were formulated in terms of 
mathematical propositions and algorithms. Two difficulties are sometimes raised 
regarding the applicability of such results to biological visual systems. The first 
is that, unlike an electronic computer, a biological system cannot be expected to 
solve the equations used in deriving the mathematical results. The second is that 
a biological system does not have access to the perfectly accurate data used in the 
mathematical abstraction. 

A comprehensive examination of the first objection would be beyond the immediate 
goals of. this review. The main answer lies, however, in the distinction between 
different levels of analysis: competence versus performance 20 or computational vs. 
algorithmic. 

The computational studies aim primarily at establishing principles such as rigidity 
or planarity that apply to any visual system facing the problem of interpreting 
structure from motion. Certain equations may be used in the derivation of such 
/—n, principles, but it does not follow, of course, that a system utilizing these principles 

would have to solve these equations in the process of interpreting structure from 
motion. 

The problem of accuracy in the measurements and computation is an important 
one. To be of practical value, the interpretation scheme must be robust: small errors 
in the measurement of position and velocity, for example, should not lead to a 
complete breakdown of the interpretation scheme. This means that computational 
studies should not only explore what is possible under idealized conditions, but 
also examine the effects of small perturbations and errors. An example of such an 
analysis will be given below. 



4-2 The use of positions vs. velocity fields 

All of the structure-from-motion schemes examined about used certain measurements 
as their "inputs" , and recovered the 3-D structure and motion in space as "output" . 
Different schemes used different inputs; some used the positions (in the image) of 
the moving elements at different times, while others used their retinal velocities. 

As noted earlier, the difference between the two formulations is not that 
velocity-based schemes are continuous and position-based ones discrete. A position- 
's based computation, for example, can use the continually changing positions, 
without necessarily requiring discrete "snapshots", but also without using velocity 
measurements. 
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It is not clear at present which formulation, the position-based or the velocity-based, 
is more directly relevant to human motion perception, since the measurements 
employed by the human visual system are not fully known. It may prove valuable, 
therefore, to explore in more detail different schemes that are based on different 
types of inputs. The comparison of such schemes with the performance of the 
human visual system could provide some clues regarding the types of measurements 
employed by the visual system in the interpretation of structure from motion. 



4-3 Parallel and perspective velocity fields 

The projection of the external environment available to our eyes is perspective rather 
than parallel. What, then, is the relevance of the parallel projection studies? There 
are two answers to this question. The first is that humans can recover structure 
from motion under orthographic projection. The second answer lies in the fact 
that under local analysis, perspective and parallel projections are nearly identical. 
(Local analysis means here that the surface patch under analysis is restricted to 
a small part of the visual field, so that the dimensions of the patch are small 
compared to its overall distance from the viewer.) If the interpretation scheme is to 
be robust and insensitive to small errors, it must also be capable of coping with the 
minor differences between the two projections, and either projection can therefore 
be assumed. One can, in fact, use the two kinds of projection as a test for stability. 
If a given interpretation scheme can operate under perspective projoection but fails 
under orthographic projection, it cannot be a stable local interpretation method. 

In the next section, I shall argue that (i) the recovery of 3-D structure from the 
instantaneous velocity field is impossible under orthographic projection, and (ii) for 
perspective projection, the recovery is unstable under local interpretation. 



5 The orthographic velocity field. 

In this section, we shall derive a complete characterization of the 3-D information 
that can be recovered from the instantaneous velocity field under orthographic 
projection. 

5.1 The depth scaling proposition 

Consider two surface patches such as Si and £2 hi Figure 3. The figure shows a 
cross-section of the surfaces from a side view. They are assumed to be rotationally 
symmetric with respect to rotations around the observer's line of sight so that Si, 
for example, is part of the surface of a sphere. The observer is assumed to view 
the objects along the Y axis, which is his line of sight, or depth axis. X is the 
/—\ • observer's horizontal axis, Z the vertical, and the X — Z plane is called the image 

plane. The objects are assumed to be fixed at one point, which is taken as the 
origin of the coordinate system. 
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Figure 3 The ambiguity of the orthographic velocity field. The surface S2 is obtained from Si 
via depth- scaling. Any motion of S\ can be "mimicked" by S 2 in such a way that their velocity 
fields will coincide. 
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Let 0\ and O2 denote the parallel projections of Si and 52, respectively (on the 
X — Y plane); Pi and P2 will denote the perspective projections of Si and S2; 
0[ and P[ (for i = 1, 2) will denote the parallel and perspective velocity fields, 
respectively; u(x, z) will denote the projected velocity in the horizontal direction 
and v[x, z) in the vertical. 

The surface 52 in Figure 3 was obtained from Si by the following transformation: 
if (x, y, z) is a point on S\, then (x,ky,z) is a point of S2, for a fixed constant k 
(k = 5 in Figure 3). Si is assumed to rotate by some angular velocity u>\ about the 
vertical axis, and S2 rotates about the same axis with angular velocity u>2 = wi/k. 
This transformation (including the rotations) will be called "depth-scaling", since 
the depth coordinate Y is scaled by a constant A;. 

Since the rotation is about the vertical Z axis, under orthographic projection, all 
the points will travel in the image plane parallel to the X axis. The projected 
velocity of a point on Si having spatial coordinates (xi, y\,z{) will be u\ — — t/iWi. 
The corresponding point on S2 (i.e., the point on S2 whose projection coincides 
with that of (21,2/1,21)) has a projected velocity 1*2 = — J/2^2- Since by definition, 
J/2 — %i> w 2 — w i/&> it follows that u\ = 112 for every image point, and therefore 
the velocity fields of Si and S2 coincide. 
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In this example, the objects were assumed to rotate about the vertical Z axis, but 
S2 can, in fact, "mimic" Si under arbitrary rotation. This claim can be established 
by noting that any rotation in space can be decomposed into two components: one 
is a rotation about an axis lying in the frontal X — Z plane, the other is a rotation 
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about the line of sight which is perpendicular to the X — Z plane. We shall call 
^"""^ the first component XZ-rotation and the second Y —rotation. A general rotation 

of Si can be decomposed, therefore, into an XZ-rotation with angular velocity u»i 
and a Y-rotation with angular velocity 0\. Let 52 rotate by a>2 = 0J\/k about the 
same axis in the X — Z plane, and assume 62 — Q\. It is not difficult to see that 
with this choice of L02, the orthographic velocity fields of S\ and S2 will coincide. 

Unlike the vertical rotation case, where the two objects had a common rotation 
axis, in this case (assuming u)\ 7^ 0, Q\ 7^ 0), the two objects would rotate about 
different spatial axes and still have identical velocity fields. We conclude that under 
orthographic projection, two objects can differ drastically in their shapes (e.g., in 
terms of their surface orientation and curvatures), axes of rotation and rotation 
speeds, and yet induce identical velocity fields. 

Since the constant k used in the definition of 52 can be chosen arbitrarily, the velocity 
fields admit not only two distinct interpretations, but an infinite family of surfaces 
for depth scaling by different factors k. (The definition of depth scaling includes 
the appropriate relation between the rotation components, a>2 = a>i/A;,02 = #i- It 
is also assumed that io\ 7^ 0.) It can be further shown that this family completely 
characterizes the set of confusable objects. We can summarize these claims in the 
following proposition: 

• The Depth Scaling Proposition. If a non-planar surface 5i is a possible 
rigid interpretation for a given orthographic velocity field, then 52 is another 
possible interpretation if and only if it is obtained from S\ via depth scaling. 

The proof of this proposition is given in Appendix 1. It serves to give a complete 
characterization of all possible interpretations of the orthographic velocity field. 
Its first implication is that this interpretation is always non-unique. The second 
implication is that, although the interpretation is non-unique, properties that are 
invariant under depth-scaling can be recovered from the orthographic velocity field. 
For example, the depth ratio yi/yj for two points with image coordinates (xi,Zi) 
and [xj,Zj) can be recovered uniquely from the velocity field; extremal points in y 
and inflection points can also be recovered. 

5.2 The orthographic velocity field of a planar object 

The depth scaling proposition stated above holds for non-planar surfaces. For planar 
objects, the ambiguity is doubled: the orthographic velocity field admits exactly 
two distinct solutions, each determined up to a depth scaling. In addition, it would 
be possible to determine from the velocity field whether or not the inducing object 
is planar. These properties of the orthographic velocity fields of planar objects are 
established in Appendix 2. 

5.S Unique structure from a "view and a half" 

f~\ As it turns out, the depth-scaling ambiguity of the solution can be resolved with the 

addition of a single view. More specifically, if the projected positions and velocities 
of five points in a general configuration (i.e., no four of which are coplanar) are 
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given at time t\, and the positions of the same five points are given at a later 
^"*' N time ^2) then the 3-D structure can be recovered uniquely (up to the unavoidable 

reflection ambiguity about the image plane). This "view-and-a-half" proposition is 
proven in Appendix 3. 

In summary, under orthographic projection, the interpretation of the velocity field 
is non-unique. For non-planar objects, if Si is a possible interpretation, then S2 
is another possible interpretation, if and only if, it is related to Si, via depth 
scaling. Only properties that are invariant under depth scaling can therefore be 
recovered from the orthographic velocity field. For planar objects, there are exactly 
two distinct interpretations up to depth scaling. The structure can be recovered 
uniquely for as few as five points in a general configuration if a single view is given 
in addition to the velocity field. A more detailed analysis, including the planar case, 
is given in the appendix. 



6 The perspective velocity field 

The perspective velocity field is, in a sense, richer in information than the or- 
thographic field. While the orthographic field admits infinitely many interpretations, 
in the perspective case, the velocity field, even in an arbitrarily small neighborhood, 
can have no more than three interpretations. 15 For a sufficiently small patch of 
^\ surface, perspective and orthographic projections are not very different, however, 

and one may suspect, therefore, that the local interpretation of perspective velocity 
fields in unstable. The argument is roughly as follows. Under orthographic projec- 
tion, two widely different objects, S\ and S2, can have similar, and even identical 
velocity fields, 0\ and 0' 2 , respectively. For a sufficiently large viewing distance, the 
perspective velocity fields, P\ and P' 2 become similar to 0\ and 0' 2 , respectively. 
Consequently, P\ and P' 2 are also closely similar, and therefore, slight errors in the 
measured velocity fields can have large effects on the interpreted 3-D shape. 

This argument can be made more precise. Let S\ and 52 in Figure 4 be two surfaces 
rotating about the Z axis. As before, the direction of gaze is along the Y axis. 
S2 is derived from Si in this case by extending the ray from the viewing point to 
each point on Si so that 2/2/2/1 = k for a given constant k. The surface S2 depends 
in this case on the viewing point: when the viewing distance increases, the shape 
of S2 in the perspective case approaches the orthographic depth scaling of Si by 
the constant k. (In Figure 4, it was assumed that the ratio of viewing distance to 
object size is such that the object occupies one degree of visual angle.) As before, 
we will choose u>2 = ui/k. Finally, let H denote the distance of the origin Q from 
the observer. 

The projected velocity of pi (a point on Si with y 7^ 0) is (ui,ui). This velocity 
will be measured as an angular velocity, i.e., the number of deg/sec a point travels 
f~\ . in the observer's field of view. The angular velocity of the corresponding point on 

S2 is (u2,V2). The claim is that when H is sufficiently large, the vectors (ui,«i) 
and (^2,^2) become arbitrarily close. That is, as H grows, the ratio between their 
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Figure 4 The instability of the local interpretation of perceptive velocity fields. Under local 
analysis, small errors in the measured velocity field can induce large errors in the interpreted 3-D 
structure. The" difference in the velocity fields induced by Sj and S2 will be less than 3.2% when 
the surface patches occupy one degree of visual angle, and about 6% at twice this size. 

magnitudes Ju\ -f- v\/\ju\ -f- v\ approaches 1, and their difference in direction, 
measured by (t>i/ui) — (^2/^2), approaches (assuming y 7^ in the region, except 
at the origin). 

The proof of this claim is straightforward, and will, therefore, not be detailed. It can 
be derived from the expressions given below for the speed ratio and the direction 
difference. These expressions follow directly from the definitions. The speed ratio 
(squared) is: 



u 



+ v\ [(H + k yi ) yi + xl/k} 2 + zlxl/k 2 {H + yi )* 



u\ + v\ 



[{H + </i)yi + x\f + z\x\ (H + kytf 



where 



x<i = X\ 



H + k Vl 
H + yi 



z 2 = Zl 



H + k yi 
ff + vi 



^*v 



The difference in direction is: 



«2 
U2 



Hi 

Ml 



zixilk 



Z1X1 



[H + fcyi)yi + x 2 2 /k {II + yi ) yi + x\ 
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It can be seen from these expressions that for any point with y\ 7^ 0, the speed 
/""S ratio approaches 1 and the difference in direction approaches as if grows. Under 

certain conditions (which will not be elaborated), this will also happen uniformly 
within a region. 

As a result, it is possible to construct drastically different objects whose velocity 
fields within a region are almost identical. That is, the differences in speed and 
direction at any given point within the region can be made arbitrarily small. As 
explained in the analysis of the orthographic case, this ambiguity is not restricted 
to rotation about the vertical axis, but can arise for arbitrary rotations as well. 

The surfaces in Figure 4 illustrate that this problem can be quite severe. When 
the viewing distance is such that the surfaces in Figure 4 a occupy one degree of 
visual angle, the differences in their perspective velocity fields within the entire one 
degree patch will not exceed 3.2%. At half the viewing distance, the maximum 
error about 6%. 



Concluding Remarks 

f ms -\ The comparisons discussed in the last sections can be combined with psychological 

experiments to gain further insight into the possible use of instantaneous velocity 
fields in the recovery of structure from motion by the human visual system. One 
can test, for example, whether shape parameters, such as surface orientation and 
curvature, can be recovered by the visual system for objects subtending one or 
two degrees of visual angle. Even a moderate success in this task would suggest 
that either our visual system measures velocities with high precision, or that the 
interpretation under these conditions does not rely on the instantaneous velocity 
field. The first of these alternatives does not seem attractive. Computational studies 
have indicated that the measurement of the velocity field is, in general, a difficult 
task, 21 ' 22 and it is probably unrealistic to expect these measurements to reach 
the level of precision required for the interpretation task. The ability to interpret 
correctly the 3-D structure of small objects can therefore provide evidence against 
the use of the instantaneous velocity field in this task. One may expect instead to 
find in the visual system, the capacity to integrate information over time periods 
that allow sufficient excursion of the moving object, rather than to base the 
interpretation on. instantaneous, velocity measurements. 



Acknowledgements: I thank E. Hildreth and E. Grimson for their help and 
^«*v comments. This work (with the exception of the "view and a half" proposition) 

appears in Human & Machine Vision, Beck & Rosenfeld, eds., Academic Press, in 
press. 
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APPENDICES 

Appendix 1: The depth scaling proposition 

This appendix uses the same notion as Section 4 above. As before, S\ and S2 are 
two rotating rigid surfaces. The rotation of S\ can be decomposed into a component 
with angular velocity uji (assumed to be non-zero) about an axis in the frontal 
X — Z plane (XZ-rotation), and a second component (F-rotation) with angular 
velocity 9\ about the line of sight Y. The corresponding components for 52 are W2 
and 9%. S2 will be called a depth scaling of Si if: 

• For every point (x,y, z) on Si, (x,ky,z) is a point on So for some constant 
ky£Q. 

• The rotations u>i and W2 are around the same axis and u>2 = wi/fc. 

• 9i = 2 . 



{S 



f^ 



The depth scaling proposition 

• If a non-planar surface, Si, is a possible rigid interpretation for a given 
orthographic velocity field, then S2 is another possible interpretation if, and 
only if, it is obtained from Si via depth scaling. 

Proof: We have seen in a previous sections that if S2 is obtained from Si via depth 
scaling, then their orthographic velocity fields coincide. The converse statement 
that remains to be shown is that if Si and S2 have identical velocity fields, then 
they are related via depth scaling. 

This property clearly holds in the vertical rotation case where Si and S2 both rotate 
about the Z axis. If the angular velocities of Si and S2 about the vertical axis are 
wi and W2 respectively, then the projected velocities at point (x, y) in the image are 
— wiyi and — w 2 j/2 respectively. This implies that for the velocity fields to coincide, 
the depth ratios 2/2/2/1 at any given point in the image must equal W1/W2. 

To show that the proposition holds under general rotation, we shall establish the 
following claim: if two rotating (non-planar) objects, Si and 6*2, have the same 
orthographic velocity field, then 9\ = $2 (where denotes, as before, the rotation 
component about the line of sight). 
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Let #i be the Y-rotation of the first object. By rotating the velocity field induced 
f~\ by Si by —6\ about the line of sight, the Y-rotation component of S\ is cancelled, 

and the only component that remains is the XZ-rotation about some axis in the 
X — Z plane. The resulting velocity field will have the property that all the velocity 
vectors will now be parallel to each other. We shall next show that — 9\ is the only 
Y-rotation which, when added to the velocity fields, results in a parallel velocity 
field. The implication will be that, given a velocity field, the Y-rotation component 
is uniquely determined, and hence, B\ — 9^. 

Without loss of generality, we can assume that in the resulting parallel velocity 
field, all the projected velocity vectors are in the direction of the X axis. This 
means that the image velocity component in the Z direction v(x, z) = at every 
image point (x,z), and the velocity field (u(x, z),v(x,z)) can therefore be described 
as (u{x,z),0). Let us now add to this field a Y-rotation with some angular speed, 
9. The combined velocity field will now be (u(x,z) — z6,x9). This will again be a 
parallel velocity field only if either 

u(x, z) — z9 = (Al) 

or 

c (A2) 



r. 



/-\ u(i, z) — z9 

for some constant c. In either case, it can be readily verified that the velocity field 
u(x, z) must have been induced by a planar surface. In the second case, for example, 
u(x, z) = (9x/c) -j- 9z and the original rotation of S\ was assumed to be about the 
Z axis. For rotation about the Z axis with angular velocity w, the velocity field is 
given by 

u(x, z) == —uy{x, z). (A3) 

It follows that 

9 
y = — — x z, (A4) 

fjJC U> 

which is the equation of a plane. 

We conclude that for a non-planar object, starting from a parallel velocity field, a 
Y-rotation by any amount will, destroy the parallelism of the field. It follows that, 
given the velocity field of S\ (which is also the velocity field of 52), there is one, 
and only one, Y-rotation, by — 6\, that will create a parallel velocity field. This 
rotation is defined by the velocity field itself, independent of the inducing object, 
and hence #i = 62. 

We can now add to both Si and S2 a rotation component — 9\ that will cancel 
rotation about the line of sight. Their velocity fields will still concide, but now both 
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Figure 5 The velocity field of a planar surface. Line a is perpendicular to the field direction, 
and the field is nullified along b. This field has two different interpretations. In one, o is the 
rotation axis and 6 is the tilt line. In the second interpretation, roles are switched: a is the tile 
line, and 6 is the projection of the rotation axis. 



objects rotate about a common axis in the image plane. Within loss of generality, 
this axis can be labeled the Z axis, and the general rotation can thereby be reduced 
to the case of vertical rotation. 



Appendix 2: The velocity fields of planar surfaces 

For a rotating planar surface, when the velocity field is parallel to the X axis, it 
will have the following form: 



u(x, z) = ax -f- pz 



(5). 



/*">. 



That is, the field depends linearly on x and z. This velocity field is illustrated in 
Figure 5. Two special lines are marked in this figure: line a (coinciding in this case 
with the z axis), which is perpendicular to the field direction, and line b (the dotted 
line in Figure 5) along which the field is nullified. 



19 



f\. 



In describing the 3-D interpretation of this field, we shall make use of the folio-wing 
' ■ definition: a plane that is not parallel to X — Z must intersect it along a straight 

line, which will be called the "tilt line" of the plane. We then have the following 
proposition: 

• There are exactly two planar interpretations of the velocity field in Figure 5. 
In one interpretation, a is the axis of rotation, and b is the tilt line. In the 
second interpretation, the roles are switched: a is the tilt line, and b is the 
projection of the rotation axis (the axis itself lies on the planar surface). Each 
of these interpretations is determined as before up to depth scaling. 

The proof of this proposition is rather straightforward, and will therefore be sketched 
briefly. Let (Q x ,Q, y ,n z ) be the angular velocity vector of the rotating object. The 
velocity of a point {x,y,z) is given by the vector product (D, x ,Q. y ,Cl z ) X (x,y,z). 
The velocity field is required to have the form: u(x, z) = ax -f- fiz, v(x, z) = 0. 
Therefore, *- 

n y z — {l g y = ax + Pz 

~ (A6) 

U x y — UyX = 0. 

One solution to these equations arises when £l y = 0. This implies that {l x = (if y 
is not identically 0) and y = —(ax -f- j3y)/Q z . This solution corresponds to a plane 
whose tilt line is b, rotating about the vertical axis. 

If Qy 7^ 0, then Q x 7^ and y = (Q y /Q x )x. The surface must therefore be a plane 
with its tilt line coinciding with the Z axis. 

In conclusion, the following statements summarize the analysis of the information 
content of the orthographic velocity field, for both planar and non-planar objects. 
Given the orthographic velocity field: 

• It is possible to determine whether the inducing object is planar. 

If it is non-planar, then the interpretation is determined exactly up to depth 
scaling. 

If it is planar, then there are two distinct solutions up to depth scaling. 



Appendix 3: Unique 3-D structure from a 'View and a half 

As we have seen, a single view of the orthographic velocity field is insufficient to 
recover the 3-D structure of inducing objects. In this section, we shall see that with 
additional, "half- a- view", the 3-D structure of an object containing at least five 
points in a general configuration is uniquely determined. 

Five points in space are arranged in a general configuration if no four of them are 
coplanar. It is assumed here that two views of the five points in motion are given. 

The first View gives the projected positions (xj,Zj), and the velocities (ui,Vi), 
i = 0. . ., 4 of the five points at time t. The second view gives the position of the 
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points (xi,zi) at some later time t'. We shall assume that between t and t', the 
(T*S rotation of the object was less than 180 degrees. 

The "view- and- a-half" proposition: 

• Given two orthographic views of five points in general configuration, the first 
view specifying the position and velocities of the points, the second their 
position only, the 3-D structure of the five points is uniquely determined. 

Proof: We have seen that the 3-D structure of non-planar objects can be recovered 
up to depth-scaling from the velocity field alone. The same result holds when the 
velocity field is known for five isolated points, rather than within a continuous 
region. What remains to be shown, therefore, is that the depth-scaling ambiguity 
can be removed with the addition of a single view at a later time t' . 

The proof is divided into two parts. In the first part, the object is assumed to rotate 
about the vertical Z axis. The second part extends the results to general motion. 



Rotation about the vertical axis 

Without loss of generality, it can be assumed that the object is fixed at one point 
(x ,y ,z ), which will serve as the coordinate system's origin. For a rotation about 
/•■% the vertical Z axis, the Z coordinates of all the points remain unaltered, and the 

x coordinates are transformed according to: 

x\ = Xicosa -f- yiksina i == 1. . .4 (A7) 

where ex is the unknown angle of rotation about the vertical axis between viewer V\ 
and Vi . Both X{ and t/j are already known while k is the still unknown depth-scaling 
factor. We thus obtain four linear equations (for 1 < i < 4) in two unknowns, cosa 
and ksina. Two independent equations are sufficient for a unique solution. Two 
equations of the form 
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x\ = Xicosa -\- yiksina 



x'- = Xjcosa -f- yjksina 

will not be independent if z;yy = xyt/j. We shall, therefore, fail to obtain a pair of 
independent equations only if x±yj = Xjyi for every 1 < i, j < 4. This condition 
implies that all of the points are lying on a common plane passing through the 
origin in contradiction to the general configuration assumption. A unique solution 
for cosa, ksina is therefore guaranteed, and this solves a and k up to a sign (which 
is the inherent ambiguity with respect to reflection about the image plane). 
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Figure 6 (a) The lines P\ — P\ and Pi — P' 2 are parallel to the X axis, (b) The points P\ and 
P'. 2 in the figure have been rotated together about the origin by an angle a. Their new positions 
are P'[ and P' 2 ' respectively. The lines Pi — P" and Pi — P' 2 ' are parallel to each other. 
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Uniqueness under general motion 

As has been noted in earlier sections, general rotation can be decomposed into 
two components: a rotation about some axis (the XZ axis) lying in the image 
plane, following by a rotation (y-rotation) about the line of sight. To extend the 
uniqueness result to general motion, we shall prove next that for five points in a 
general configuration, the I'-rotation and the XZ axis can be recovered uniquely 
from the two views. Consequently, it is always possible to "undo" the Y"-rotation 
and re-label the XZ axis as the new Z axis, thereby reducing the motion between 
the two views to rotation about the vertical axis. The remainder of the proof is 
divided into two lemmas. 

Lemma 1: Let P\, P2, P\ and P' 2 be four points in the X — Y image plane, such 
that the lines Pi — P[ and Pi — P' 2 are both parallel to the X axis (figure 6a). Let 
P' x and P' 2 now rotate together in the plane about the origin by an angle a, and 
denote by P", P'^, the new position of P[, P' 2 , respectively. Question: is there an 
angle a such that the new lines Pi — P'[ and P2 — P\ will again be parallel to each 
other? (See fig 6.) 

Claim: Under the following two conditions 

(i) {P\ } P 2 ) is not a reflection of (Pi, P2) about the Z axis; 

(ii) at least one of the lines Pi — P2, P\ — P' 2 does not pass through the origin; 
there exists exactly one such angle a. 
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Proof: Let the coordinates of P* be (i», ^) of P^ be (x^) and of P? ««?). 



x'l — x^cosa — ^sina 



z" = x[sina + ^cosa (A8) 

Following the rotation, the lines (Pi — P'[) and (P 2 — P") are parallel, therefore 
their slopes coincide: 



*1 — z" _ fg — 4 

II — x" X2 — x' 2 ' 



(A9) 



(There is also the possibility that x x = x'{ and x 2 = x 2 and therefore the 
denominators vanish. We shall see, however, that in this case, there is still a unique 
solution for a.) 

Substituting for x\ and z" from (A8) yields the following. (Note that z\ = z[, 
z 2 = 4.) 



z\ — x\sina — z\cos<x _ 22 — ij«'na — z^cosa 
x\ — x'^cosa -j- zisz'no: X2 — x' 2 cosa -|- z^sinot 

which reduces to the form 



(A10) 



asina + 6cosa = 6 (^411) 

where 



a = xix' 2 — x\x<i 



b = Xi^2 + ^1^2 — Z\X2 — z\x\. 
If a and b are not both zero, (All) has exactly one solution, given by: 

2ab 



since 



a 2 -\-b 2 



b 2 -a 2 



Cosa ==^T2- (^2) 

If a = and 6 = 0, then (All) provides two equations in x\ and x' 2 . If these 
equations are independent, then their solution is x[ = x\, x' 2 = — X2, in violation 
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of condition (i) of the lemma. The equations are dependent when 21/2:2 = ^1/^2 = 
/""■"N z'i/z'2 which violates assumption (ii). 

Finally, if there is an angle a which makes the denominators in (A9) equal to zero, 
this angle is still a solution, and the only solution to the equation is (All). 

Let us consider next two frames of an object with five points (in general configuration) 
rotating about the Z axis. Let P{ and P[ for i — 0. . ., 4 denote the first and second 
frames, respectively, without loss of generality P = P' . As in lemma 1, the lines 
Pi — P[, i = 1. . ., 4 are all parallel to the X axis. The points P\ are now rotated 
about the origin, and their new positions are denoted by P". 

Lemma 2: 

For every < a < 2tt, the lines Pi — P" will no longer be all parallel to one 
another. 

Proof: Suppose that an angle a exists such that Pi — P", i = 1. . .4 are all parallel 
to one another. Consider the triplet of points (P' ,P\,P'^) (where P' = P is the 
origin, and i = 2, 3, 4). From lemma 1, such a triplet has a single angle c^ that 
would make the line Pi — P" parallel to Pi — P". To make all the lines Pi — P", 
i — 2. . .4 parallel simultaneously, 0:2 = 0:3 = 0:4 = a. From (All), 

a i= l-c ££ a =c 

bi sma 

where c is constant, (since 7^ implies that bi j^ 0, unless both a» and bi equal 
zero.) We therefore obtain that if bi 7^ 0, 

a>i x i x i x-^Xi . . 

— = — p- 5 i T = c. (A14) 

bi x\Zi -+- x\zi — z\%i — z\x\ 

We now make use of the fact that the points (x\, y'i) are obtained from (xufi) via 
rotation by some angle 6 7^ about the Z axis, namely 

x'i = XiCosQ — yisinO. (A15) 

Equation (A14) now becomes 

x\(xiCosQ — yisinO) — Xix\ = 

cx\Zi 4- cx\zi — cz\Xi — czi(xicos0 — yisinO). (A16) 

a**^ Equation (A16) holds for i = 2, 3, 4, including the points (if any) for which 

ai — bi = 0, in which case it is satisfied for every value of c. Note that x\, y^, z\ do 
not appear in (A16), which therefore reduces to the form: 

24 



O Ax i +By i + Cz i = Q {All) 

where 

A = xicosQ — x\ -f- c^i -f- CZ]COS# 
I? = — x\sin6 — cz\sin9 
C = — c%\ — cx\. 

Equation (A17) is an equation of a plane passing through the origin. All of the 
points (xi,yi,Zi) for i = 2,3,4 must obey (A17), and are, therefore, coplanar, in 
contradition to the general configuration assumption. It can be assumed that A, 
B and C are not all zero. Examine, for instance, the coefficient B, which depends 
only on the "reference point" [x\, z{). If B = 0, we will choose a different point as 
the reference point (a;, zi). B cannot vanish for all points, since such a condition 
would imply 

B = — X{sin0 — czisinO = 
for i = l.. .4, which is again, an equation of a plane. 
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Conclusion: Given two orthographic projections of an object in a general 
configuration, the XZ axis and the ^-rotation are uniquely determined. If a 
denotes the ^-rotation between the two frames, then rotating the second frame 
by — a would make all the lines (xi,Zi) — {x'^z'A parallel to one another and the 
XY axis would then be perpendicular to these lines. Lemma 2 established that this 
angle a is unique, and hence, the Y-rotation and XY axis are determined uniquely 
by just two frames. 
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